A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

Authors

  • Stéphane Ross
  • Geoffrey J. Gordon
  • J. Andrew Bagnell
Abstract

Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches (Daumé III et al., 2009; Ross and Bagnell, 2010) provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no-regret algorithm in an online learning setting. We show that any such no-regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
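The abstract does not spell out the steps of the iterative algorithm. The sketch below is only an illustration of the general idea it describes: train a stationary deterministic policy on the states that the policy itself induces. It assumes a dataset-aggregation loop on a toy chain world. The environment, the `expert` function, the lookup-table learner, and every constant are hypothetical choices made for this example, not details taken from the paper.

```python
from collections import Counter, defaultdict

GOAL, START, HORIZON = 2, 5, 4   # toy values chosen for illustration only
DEFAULT_ACTION = 1               # what the learner does in states it has never seen


def expert(state):
    """Hypothetical expert: step toward GOAL, stay put once there."""
    return (GOAL > state) - (GOAL < state)


def step(state, action):
    """Deterministic chain dynamics on states 0..10."""
    return max(0, min(10, state + action))


def train(dataset):
    """Fit a deterministic lookup-table policy by majority vote per state."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda state: table.get(state, DEFAULT_ACTION)


def rollout(policy):
    """Run the policy for HORIZON steps; return visited states and the final state."""
    state, visited = START, []
    for _ in range(HORIZON):
        visited.append(state)
        state = step(state, policy(state))
    return visited, state


def aggregate_and_retrain(n_iters):
    """Repeatedly roll out the learner, label its states with the expert, retrain."""
    dataset = []
    policy = train(dataset)           # untrained learner: always DEFAULT_ACTION
    for _ in range(n_iters):
        visited, _ = rollout(policy)  # states induced by the learner itself
        dataset += [(s, expert(s)) for s in visited]  # expert labels those states
        policy = train(dataset)       # retrain on everything collected so far
    return policy
```

In this toy setting each round exposes one more state where the learner still disagrees with the expert, so the learner only reaches and holds the goal after several rounds of collecting labels on its own trajectories.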


Similar resources

Reinforcement and Imitation Learning via Interactive No-Regret Learning

Recent work has demonstrated that problems, particularly imitation learning and structured prediction, where a learner’s predictions influence the input distribution it is tested on can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require nor benefit from information about the cost of acti...

No-Regret Methods for Learning Sequential Predictions Thesis Proposal

Sequential prediction problems arise commonly in many areas of robotics and information processing. For instance, in robot navigation tasks, autonomous robots rely on the ability to make a sequence of actions, given a sequence of observations revealed to them over time, in order to reach the desired goal location. Similarly, complex information processing tasks, such as structured prediction pr...

HC-Search: Learning Heuristics and Cost Functions for Structured Prediction

Structured prediction is the problem of learning a function from structured inputs to structured outputs. Inspired by the recent successes of search-based structured prediction, we introduce a new framework for structured prediction called HC-Search. Given a structured input, the framework uses a search procedure guided by a learned heuristic H to uncover high quality candidate outputs and then...

Learning to Search: Structured Prediction Techniques for Imitation Learning

Modern robots successfully manipulate objects, navigate rugged terrain, drive in urban settings, and play world-class chess. Unfortunately, programming these robots is challenging, time-consuming and expensive; the parameters governing their behavior are often unintuitive, even when the desired behavior is clear and easily demonstrated. Inspired by successful end-to-end learning systems such as ...

Multi-Armed Bandits on Unit Interval Graphs

An online learning problem with side information on the similarity and dissimilarity across different actions is considered. The problem is formulated as a stochastic multi-armed bandit problem with a graph-structured learning space. Each node in the graph represents an arm in the bandit problem and an edge between two nodes represents closeness in their mean rewards. It is shown that the result...

Publication date: 2011